ABSTRACT



Data mining system including a user interface 102, a plurality of data sources 114, at least one top-down data analysis module 104 and at least one bottom-up data analysis module 104' in cooperative communication with each other and with the user interface 102, and a server processor 106 in communication with the data sources 114 and with the data analysis modules 104, 104'. Data mining method involving the integration of top-down and bottom-up data mining techniques to extract 208 predictive models from a data source 114. A data source 114 is selected 200 and used to construct 202 a target data set 108. A data analysis module is selected 203 and module-specific parameters are set 205. The selected data analysis module is applied 206 to the target data set based on the set parameters. Finally, predictive models are extracted 208 based on the target data set 108.

11 Claims, 5 Drawing Sheets 
METHOD FOR GENERATING PREDICTIVE
MODELS IN A COMPUTER SYSTEM 

This application is a continuation of Ser. No. 08/213,19i filed Mar. 15, 1994.

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates generally to the field of data 
mining systems used to retrieve data from one or more 
designated databases, and relates more specifically to a 
system for extracting patterns and relations from data stored 
in databases to generate predictive models. 

2. Description of Related Art 

Accurate forecasting relies heavily upon the ability to analyze large amounts of data. This task is extremely difficult because of the sheer quantity of data involved and the complexity of the analyses that must be performed. The problem is exacerbated by the fact that the data often resides in multiple databases, each database having different internal file structures.

Rarely is the relevant information explicitly stored in the databases. Rather, the important information exists only in the hidden relationships among items in the databases. Recently, artificial intelligence techniques have been employed to assist users in discovering these relationships and, in some cases, in automatically discovering the relationships.

Data mining is a process that uses specific techniques to find patterns in data, allowing a user to conduct a relatively broad search of large databases for relevant information that may not be explicitly stored in the databases. Typically, a user initially specifies a search phrase or strategy and the system then extracts patterns and relations corresponding to that strategy from the stored data. These extracted patterns and relations can be: (1) used by the user, or data analyst, to form a prediction model; (2) used to refine an existing model; and/or (3) organized into a summary of the target database. Such a search system permits searching across multiple databases.

There are two existing forms of data mining: top-down and bottom-up. Both forms are separately available on existing systems. Top-down systems are also referred to as "pattern validation," "verification-driven data mining," and "confirmatory analysis." This is a type of analysis that allows an analyst to express a piece of knowledge, validate or invalidate that knowledge, and obtain the reasons for the validation or invalidation. The validation step in a top-down analysis requires that data refuting the knowledge as well as data supporting the knowledge be considered. Bottom-up systems are also referred to as "data exploration."

Bottom-up systems discover knowledge, generally in the form of patterns, in data. Existing systems rely on the specific interface associated with each database, which further limits a user's ability to dynamically interact with the system to create sets of rules and hypotheses that can be applied across several databases, each having separate structures. For large data problems, a single interface and a single data mining technique significantly inhibit a user's ability to identify all appropriate patterns and relations. The goal of performing such data mining is to generate a reliable predictive model that can be applied to data sets.

Furthermore, existing systems require the user to collect and appropriately configure the relevant data, frequently from multiple and diverse data sources. Little or no guidance or support for this task is provided.
Thus, there remains a need for a system that permits a user to create a reliable predictive model using data mining across multiple and diverse databases.

SUMMARY OF THE INVENTION 

The present invention involves a data mining system and method used to generate predictive models. The method involves the use of a computer system having a user interface 102, a plurality of data sources 114, such as databases, a server processor 106, at least one top-down data analysis module 104, and at least one bottom-up data analysis module 104'. The server processor 106 is in communication with the data sources 114 and with the data analysis modules 104, 104'. The data analysis modules 104, 104' interact between the user interface 102 and the server processor 106.

The inventive method generally involves the integration of top-down and bottom-up data mining to generate predictive models. A first step involves selecting 200 data from the data sources 114. A target data set 108, which may be a single one of the data sources 114 or a subset of data selected from one or more of the data sources 114, is constructed 202. The user selects 203 a data analysis module, then the processor 106 generates 204 module-specific data files and a data file specification. A predictive model is extracted 208 using the selected one of the data analysis modules and based on the target data set. The predictive models finally may be stored 209 in a repository 110 for future use.
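The overall flow (select 200, construct 202, select a module 203, apply it and extract 208, store 209) can be pictured with a minimal Python sketch. It assumes each data source has already been reduced to a list of dictionary records, treats an analysis module as any function that takes the target data set plus module-specific parameters, and uses a toy majority-value "module". The names `construct_target_set`, `majority_module`, `mine`, and `Repository` are hypothetical, and step 204 (generating module-specific data files) is left out.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable

Record = dict[str, object]  # assumption: one row of a data source, attribute -> value


@dataclass
class Repository:
    """Hypothetical stand-in for repository 110: stores extracted models by name."""
    models: dict[str, object] = field(default_factory=dict)


def construct_target_set(sources: dict[str, list[Record]], selected: list[str],
                         keep: Callable[[Record], bool] = lambda r: True) -> list[Record]:
    """Steps 200 and 202: pool the records of the selected sources and keep a subset."""
    return [r for name in selected for r in sources[name] if keep(r)]


def majority_module(target: list[Record], goal: str) -> dict[str, object]:
    """Toy analysis module: 'predicts' the most common value of the goal attribute."""
    value, _ = Counter(r[goal] for r in target).most_common(1)[0]
    return {"goal": goal, "predict": value}


def mine(sources, selected, module, params, repository, name):
    target = construct_target_set(sources, selected)  # steps 200, 202
    model = module(target, **params)                   # steps 205, 206, 208
    repository.models[name] = model                    # step 209
    return model


sources = {
    "database_a": [{"region": "west", "churn": "no"}, {"region": "east", "churn": "yes"}],
    "flat_file_b": [{"region": "west", "churn": "no"}],
}
repo = Repository()
model = mine(sources, ["database_a", "flat_file_b"], majority_module,
             {"goal": "churn"}, repo, "churn-baseline")
print(model)  # {'goal': 'churn', 'predict': 'no'}
```

Passing the module in as a plain function mirrors the modular design described here: swapping in a different analysis technique changes only the `module` argument, not the surrounding flow.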

In one embodiment, a series of user query phrases, which may be in the form of concept definitions, identified goal attributes, hypotheses, a search term, search strategy, and the like, are defined and validated against the target data set 108. The validated query phrases then are stored and selectively directed to a selected one of the bottom-up data analysis modules 104' using the server processor 106 for bottom-up processing. A predictive model, based on a set of generated rules, is extracted by the selected data analysis module 104, 104' based on the target data set 108 and the validated query phrases.

In subsequent uses of the inventive method, models stored in the repository 110 may be used to analyze and make predictions about new data. Several data analysis modules 104, 104' may be used to aid in the formulation and validation of query phrases. For example, data generated by the deductive processing module may be presented by the visualization module to make certain relationships within the data more apparent. Other modules that may be used in practicing the inventive method include clustering, case-based reasoning, inductive learning, and statistical analysis.

The present invention also includes a system incorporating and embodying the same functions and features described above.

BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 is a general block diagram depicting the working environment of the present inventive method.

FIG. 2 is a flow diagram of an embodiment of the present invention.

FIG. 3 is a flow diagram of an alternative embodiment of the present invention.

FIG. 4 is a flow diagram of an alternative embodiment of the present invention.

FIG. 5 is a flow diagram of an alternative embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention is a data mining method and system used to generate predictive models that may be applied against various data sources, such as databases. The method involves the application of top-down and bottom-up data mining techniques in a modular system.

An exemplary environment for the present system is shown in FIG. 1. That system includes a graphical user interface (GUI) 102 through which the user interacts with the system in generating the predictive models. This first GUI 102 is associated with a command and data preparation module 103 that enables the user to generate the initial data mining request. The top GUI 102 is used for performing steps 200, 202, 203, and 204. The data preparation module 103 is in communication with a plurality of data analysis modules 104 and 104', each of which may include an associated GUI 105 or 105', respectively.

The illustrated system of FIG. 1 includes several different modules 104, 104', each of which is a different data mining component that implements a different data mining technique. Alternatively, a system may include a single top-down data analysis module 104 and a single bottom-up data analysis module 104', as described in further detail below. Exemplary modules 104, 104' include deductive database processing, inductive learning, clustering, case-based reasoning, visualization, and statistical analysis. Modules 104, 104' may be added to or omitted from a particular system, as required by the user.

The modules 104, 104' may be custom designed for specific applications or generally commercially available. For example, an inductive learning module 104' is available from Reduct Systems, Inc. (Regina, Canada) under the name Datalogic-R. That module creates rules from a data set that is contained in a flat file. An exemplary visualization module is available under the name PV-Wave from Visual Numerics (Colorado Springs, Colo.). That product is a visualization tool that creates a variety of visualizations from data stored in flat files. The only commercially available deductive database processor module is that contained in RECON, available from Lockheed Martin Missiles & Space, Sunnyvale, Calif. That module interfaces with relational databases and allows its user to graphically formulate queries, concepts, and rules.

Some of the modules 104, for example the deductive database processor and the case-based reasoning modules, typically are used for top-down mining. Other modules 104', for example inductive learning, conceptual clustering, and data visualization, are used for performing bottom-up mining. The inventive system includes at least one top-down module 104 and at least one bottom-up module 104'.

The modules 104, 104' are in cooperation and in communication with each other. Information and data may be shared among the modules to extract data from identified data sources 114 based on user-defined input, such as queries. The modules 104, 104' also are in direct communication with a server processor 106. One function of the server processor 106 is to convert attributes and characteristics of a selected data source 114 to those expected by the selected module 104, 104'. Thus, whenever a module is added to the system, the server processor 106 performs a type of impedance matching, transforming the data from the data source to conform with the expected format of the selected module.

The system is built on a distributed client/server architecture, wherein each data analysis module 104, 104' is a client to the server 106. The server 106 accesses and maintains a link to a target database 108 and a knowledge repository 110 and functions to generate specifications describing the mined data. Thus, at least indirectly, the modules 104, 104' are in communication with each other, the server 106, the repository 110, and the target data set 108.

The knowledge repository 110 is an accessible repository for the output of the present system; it is stored on a storage medium and retrieved to a memory register when in use. The mined knowledge, including predictive models and validated queries, and all user-provided domain knowledge may be stored in the repository 110. The repository 110 is accessible by the server 106, which provides information from the repository 110 to the various modules 104, 104'. By placing all of this information in a single repository 110, the user-defined knowledge, which typically consists of high-level concepts and interrelations among attributes and among values in the target data set 108, may be shared among several modules and users.

The target data set 108 typically represents a subset of a larger underlying data source 114 extracted by the user. The data in the target data set 108 may be compiled from data sources 114 having different formats. For example, and as illustrated in FIG. 1, the data source may be formatted as a database, a spreadsheet, a flat file, or another format type. The server 106 is responsible for transforming the target database 108 to the necessary formats, for filling in missing values if necessary, and for locally maintaining the transformed data.

Thus, the server 106 also communicates with the various data sources 114. Typically, each data source 114 is a database having an associated database management system (DBMS) 112. However, it is possible to have data sources 114 which do not include an associated DBMS 112, for example spreadsheets and flat files. In such an instance, the server 106 acts as the translator between the output of the modules 104 and the data sources 114.

Turning now to the inventive method, as shown generally in the flow diagram of FIG. 2, a data source 114 is selected 200 and input into the system. The user may direct exploration 201 of an idea, such as a query or hypothesis, in the data source 114 before constructing 202 a target data set 108 based on the selected 200 data source 114. The data source 114 preferably is a database or collection of databases, but may include a spreadsheet or flat files.

The user selects 203 a data analysis module 104 to perform data mining. Module-specific data files and a data file specification are generated 204 and stored. Module-specific parameters, which may be in the form of user queries or hypotheses, are set 205 using the GUI 105 of the selected data analysis module. The selected data analysis module then is applied 206 to the target data set 108, and the results are returned to the user for examination 207 via the module GUI 105. Once the user determines that the mining results are satisfactory, a predictive model is extracted 208 based on those results. Depending on the specific application, the predictive models may include a collection of rules for symbolic models, a set of equations for statistical models, a trained neural network for neural models, and the like. The extracted predictive model then may be saved 209, for example in the knowledge repository 110.

In an alternative embodiment, and as illustrated in FIG. 3, once the data source is selected 200 and the target data set is constructed 202, a query phrase is defined 210 using the deductive database module 104. A query phrase is a plain-language query or request, such as "what is the return on investment for ...?". The query phrase is the basis for pattern validation, or top-down mining, since the user, through the user interface 105 of the deductive database module 104, may graphically express a pattern in "if . . . then . . ." form. This form typically is referred to as a hypothesis. In one example, the target data set 108 is identified through user interactions with the visualization module 104'.

The server 106 receives the query phrase, then tests the query against the data in the target data set 108 to validate 211 the query. Data is retrieved from the target data set 108 responsive to the query. In one embodiment, a hypothesis is posed, and both the data that supports and the data that refutes the hypothesis are retrieved. The server 106 then reports 210 the data back to the deductive database module 104 and to the user via the user interface device 105. The user may decide whether the data supports the query, and whether the query should be considered validated at that point. The validated query may be stored in the repository 110, together with other validated queries.
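As a minimal sketch of validation step 211, a hypothesis can be modeled as a pair of predicates over records, and validation as splitting the records covered by the "if" part into those that support and those that refute the "then" part. The `Hypothesis` and `validate` names and the `confidence` ratio are illustrative assumptions, not part of the described system; as in the text, deciding whether the evidence is strong enough to call the query validated is left to the user.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, object]


@dataclass
class Hypothesis:
    """An "if ... then ..." query phrase expressed as two predicates (an assumption)."""
    text: str
    condition: Callable[[Record], bool]
    conclusion: Callable[[Record], bool]


@dataclass
class ValidationReport:
    supporting: list[Record]
    refuting: list[Record]

    @property
    def confidence(self) -> float:
        """Share of covered records that support the hypothesis (0.0 if none are covered)."""
        covered = len(self.supporting) + len(self.refuting)
        return len(self.supporting) / covered if covered else 0.0


def validate(hypothesis: Hypothesis, target: list[Record]) -> ValidationReport:
    """Step 211: retrieve both the records that support and those that refute the hypothesis."""
    covered = [r for r in target if hypothesis.condition(r)]
    return ValidationReport(
        supporting=[r for r in covered if hypothesis.conclusion(r)],
        refuting=[r for r in covered if not hypothesis.conclusion(r)],
    )


target = [
    {"tenure": 1, "churn": True},
    {"tenure": 2, "churn": True},
    {"tenure": 3, "churn": False},
    {"tenure": 10, "churn": False},
]
h = Hypothesis("if tenure < 5 then churn", lambda r: r["tenure"] < 5, lambda r: r["churn"])
report = validate(h, target)
print(len(report.supporting), len(report.refuting))  # 2 1
```

Returning the refuting records alongside the supporting ones, rather than only a score, keeps the top-down requirement that both kinds of evidence be considered and shown to the user.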

It is possible that one module 104 is used to define 210 the query phrase while another module 104 is used to present the retrieved data to the user. For example, the deductive database processor module 104 may be used to define 210 a query phrase in the form of a hypothesis. The server then validates 211 the query against the target data set and returns the retrieved data in graphical form using a visualization module 104'.

In addition to the patterns being proposed by the user, the target data set 108 may support additional important patterns that could be identified only by intelligently exploring its contents. Data exploration, or bottom-up mining, results in the automatic generation of several patterns, or rules. The present invention incorporates this bottom-up technique in its data mining approach.

The validated query is directed 212 by the server processor 106 to one of the bottom-up data analysis modules 104'. The module then extracts 214 a set of rules using commercially available tools. Preferably, the rules making up the rule set are in the form of "if . . . then" hypotheses, but may take other forms as appropriate for the specific application, such as a neural network.

The extracted rule set may be stored 215 in a knowledge repository 110 accessible by the server processor 106. The rule set may be exported to and executed by other rule-based expert systems. In a preferred embodiment, the rule set is stored 215 in the knowledge repository 110 together with the validated query phrases.

The server processor 106 combines the set of rules and the validated query phrases in the knowledge repository 110 to extract 216 a predictive model. The extraction may be automatic, whereby the bottom-up component extracts rules from the target data set 108, or may be manual, whereby the user defines the rule and then checks it against the target data set 108 to extract data that supports and data that refutes the rule. The predictive model thus extracted may be used against other target data sets 108 and by other systems. If modules 104 or data sources 114 are added to the system, the models may be retrieved from the knowledge repository 110 and applied or validated by those modules against new target data sets 108.
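One way to picture step 216 is a model made of ordered "if . . . then" rules that is serialized for storage and reloaded later against a new target data set. This is a sketch under stated assumptions: the dictionary rule format, the choice to try validated hypotheses before induced rules, the first-match policy, and JSON text as the repository format are all illustrative, not the combination method of the described system; `build_model`, `predict`, and `matches` are hypothetical names.

```python
import json

# Assumed rule format: {"if": {attribute: value, ...}, "then": predicted_value}


def matches(rule: dict, record: dict) -> bool:
    """True when every attribute test in the rule's "if" part holds for the record."""
    return all(record.get(attr) == value for attr, value in rule["if"].items())


def build_model(induced_rules: list[dict], validated_hypotheses: list[dict],
                goal: str, default: object) -> dict:
    """Illustrative combination policy: validated hypotheses are tried before induced rules."""
    return {"goal": goal, "default": default,
            "rules": list(validated_hypotheses) + list(induced_rules)}


def predict(model: dict, record: dict) -> object:
    """Return the conclusion of the first matching rule, else the model default."""
    for rule in model["rules"]:
        if matches(rule, record):
            return rule["then"]
    return model["default"]


repository: dict[str, str] = {}  # stand-in for repository 110: name -> JSON text

induced = [{"if": {"plan": "basic"}, "then": "yes"},
           {"if": {"plan": "premium"}, "then": "no"}]
validated = [{"if": {"plan": "premium", "support_calls": "many"}, "then": "yes"}]
repository["churn-v1"] = json.dumps(build_model(induced, validated, "churn", "no"))

# A later run: retrieve the stored model and apply it to a new target data set.
model = json.loads(repository["churn-v1"])
new_target = [{"plan": "basic", "support_calls": "few"},
              {"plan": "premium", "support_calls": "many"},
              {"plan": "premium", "support_calls": "few"},
              {"plan": "trial"}]
print([predict(model, r) for r in new_target])  # ['yes', 'yes', 'no', 'no']
```

Storing the model as plain text is one simple way to let other systems, or modules added later, retrieve and reapply it without sharing in-memory objects.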

The present invention may be applied in a variety of embodiments, each of which depends on the types of data analysis modules 104 and data sources 114 made available to the system. Turning now to FIG. 4, which shows an embodiment of the present invention in a relational database environment, a user selects 302 a database from a listing of databases provided at the interface 102. The server connects 304 to the selected database, then extracts 306 the schema of the selected database, typically including tables containing a variety of attributes, and presents the schema to the user at the interface 102. The user examines 308 each table in the schema through the user interface 102.

The user then takes one of two actions: (1) selects 310 several tables; or (2) selects 312 a single table to become the target data set 108. If more than one table is selected at step 310, then the target data set 108 may be formed by joining 314 the selected tables and by further constraining 316 the values of the selected tables. The specification for the target data set 108 formed in either of these manners is saved 318 in the server 106.
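For the relational embodiment, steps 306 through 318 can be sketched with Python's built-in `sqlite3` module. SQLite, its `sqlite_master` catalog, and `PRAGMA table_info` are stand-ins chosen for illustration (another DBMS would expose its schema differently), and the specification dictionary and the names `extract_schema` and `build_target_query` are hypothetical.

```python
import sqlite3


def extract_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Step 306: map each table to its column names (uses SQLite's catalog and PRAGMA)."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')] for t in tables}


def build_target_query(spec: dict) -> tuple[str, tuple]:
    """Steps 310-316: turn a target-set specification into one parameterized SELECT.

    Table names and join conditions are assumed to come from the extracted schema,
    not from free-form user input; only constraint values are bound as parameters.
    """
    tables = spec["tables"]
    sql = f"SELECT * FROM {tables[0]}"
    for table, on in zip(tables[1:], spec.get("joins", [])):
        sql += f" JOIN {table} ON {on}"                   # step 314: join
    constraints = spec.get("constraints", [])
    if constraints:                                       # step 316: constrain
        sql += " WHERE " + " AND ".join(expr for expr, _ in constraints)
    return sql, tuple(value for _, value in constraints)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'west'), (2, 'east');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")
schema = extract_schema(conn)
spec = {  # the kind of specification step 318 would save, e.g. as JSON
    "tables": ["customers", "orders"],
    "joins": ["orders.customer_id = customers.id"],
    "constraints": [("customers.region = ?", "west")],
}
sql, params = build_target_query(spec)
target_rows = conn.execute(sql, params).fetchall()
print(len(target_rows))  # 2
```

A single-table selection (step 312) is the same specification with one entry in `"tables"` and no joins, so both paths end in the same saved form.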

An exemplary bottom-up mining aspect of the present invention is illustrated by the flow chart of FIG. 5. In that example, the user selects 402 the rule induction module 104' for bottom-up mining. The user then specifies 404 the size of a sample from the target data set 108 using the user interface 102. The inductive learning module 104' may then discretize 406 the values of the numeric-valued attributes of the target data set 108 and permit the user to specify the goal attribute, which is the attribute that is the subject of the end-result predictive model.
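Steps 404 and 406 (sampling, then discretizing numeric attributes while leaving the goal attribute alone) might look like the following sketch. Equal-width binning is only one possible discretization scheme and is an assumption here, as are the bin labels, the default of three bins, and the fixed random seed; the text does not say how the inductive learning module discretizes. `equal_width_edges` and `prepare_sample` are hypothetical names.

```python
import bisect
import random


def equal_width_edges(values: list[float], k: int) -> list[float]:
    """Interior cut points that split [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard: a constant attribute still gets usable edges
    return [lo + i * width for i in range(1, k)]


def prepare_sample(target: list[dict], sample_size: int, goal: str,
                   k: int = 3, seed: int = 0) -> tuple[list[dict], dict[str, list[float]]]:
    """Steps 404-406: draw a sample, then discretize numeric non-goal attributes."""
    rng = random.Random(seed)  # fixed seed so repeated runs see the same sample
    sample = rng.sample(target, min(sample_size, len(target)))
    numeric = [a for a, v in sample[0].items()
               if a != goal and isinstance(v, (int, float)) and not isinstance(v, bool)]
    edges = {a: equal_width_edges([r[a] for r in sample], k) for a in numeric}
    discretized = [
        {a: (f"{a}_bin{bisect.bisect_left(edges[a], v)}" if a in edges else v)
         for a, v in r.items()}
        for r in sample
    ]
    return discretized, edges


target = [{"age": a, "plan": p, "churn": c} for a, p, c in [
    (20, "basic", "yes"), (25, "basic", "yes"), (30, "premium", "no"),
    (40, "premium", "no"), (60, "basic", "yes")]]
sample, edges = prepare_sample(target, sample_size=5, goal="churn", k=2)
print(edges)  # {'age': [40.0]}
```

Keeping the cut points (`edges`) alongside the discretized sample means the same bins could later be applied to new records before the induced rules are used on them.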

The user next selects 408 the duration of the rule induction run, i.e., how long the module will operate against the sample data set. Rules are automatically created 410. The user may select 412 validated hypotheses from the knowledge repository 110 that are used in generating 410 the new rules. Once the rules are created 410, the user may inspect 414 and edit 416 the rules before they are stored 418 in the knowledge repository 110 for subsequent use. In addition, the rules are tested 420, or validated, against a portion of the target data set. An explanation 422 of these test results may be presented to the user through the interface 102. The validated rules are used to further expedite the rule induction process and improve the quality of the induced rules.
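The induce-then-test loop of steps 410 and 420 can be illustrated with a deliberately simple learner. The sketch below uses OneR (Holte's one-attribute rule learner) purely as a stand-in for the commercial induction module; it is not the algorithm of Datalogic-R or any other product named in the text, and `one_r`, `evaluate_rules`, and the default-prediction fallback are hypothetical. Records are assumed to be already discretized, as after step 406.

```python
from collections import Counter, defaultdict


def one_r(train: list[dict], goal: str) -> tuple[str, dict]:
    """Step 410 stand-in (OneR): pick the single attribute whose value -> majority-goal
    rules are right most often on the training records."""
    best = None
    for attr in train[0]:
        if attr == goal:
            continue
        counts = defaultdict(Counter)  # attribute value -> counts of goal values
        for r in train:
            counts[r[attr]][r[goal]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        correct = sum(c[rules[v]] for v, c in counts.items())
        if best is None or correct > best[0]:
            best = (correct, attr, rules)
    _, attr, rules = best
    return attr, rules


def evaluate_rules(attr: str, rules: dict, holdout: list[dict], goal: str,
                   default: object) -> tuple[float, list[dict]]:
    """Step 420: accuracy on a non-empty holdout, plus the misclassified records (for 422)."""
    misses = [r for r in holdout if rules.get(r[attr], default) != r[goal]]
    return 1 - len(misses) / len(holdout), misses


train = [{"plan": "basic", "age": "age_bin0", "churn": "yes"},
         {"plan": "basic", "age": "age_bin1", "churn": "yes"},
         {"plan": "premium", "age": "age_bin0", "churn": "no"},
         {"plan": "premium", "age": "age_bin1", "churn": "no"}]
holdout = [{"plan": "basic", "age": "age_bin0", "churn": "yes"},
           {"plan": "premium", "age": "age_bin1", "churn": "yes"}]
attr, rules = one_r(train, "churn")
accuracy, misses = evaluate_rules(attr, rules, holdout, "churn", default="no")
print(attr, rules, accuracy)  # plan {'basic': 'yes', 'premium': 'no'} 0.5
```

Because `rules` is an ordinary dictionary, the inspect and edit steps 414 and 416 amount to the user changing entries before the rules are stored 418, and the misclassified records returned by `evaluate_rules` are the raw material for an explanation 422.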

In a similar manner, other modules 104 may be used by 
the system and in practicing the present invention. The 
above description is included to illustrate the operation of 
the preferred embodiments and is not meant to limit the 
scope of the invention. The scope of the invention is to be 
limited only by the following claims. From the above 
discussion, many variations will be apparent to one skilled 
in the art that would yet be encompassed by the spirit and 
scope of the invention. 

What is claimed is: 

1. A data mining method for generating predictive models in a computer system, said computer system comprising:
a user interface;
at least one data source;
at least one top-down data analysis module and at least one bottom-up data analysis module in cooperative communication with each other and in communication with the user interface, where the top-down data analysis module considers data supporting and refuting a piece of expressed knowledge, validates or invalidates the knowledge, and gives reasons for the validity or invalidity of the knowledge, and the bottom-up data analysis module discovers knowledge in data; and
a server processor, in communication with each data source and with the data analysis modules;
the method comprising the steps of:
selecting data from at least one data source;
constructing a target data set from the data selected from the data source(s);
extracting a predictive model using at least one of the data analysis modules based on the target data set;
storing the predictive model for future use;
generating a knowledge base set, wherein said knowledge base set includes a set of rules, a validated query phrase, and said predictive model;
selecting the knowledge base set; and
validating a query phrase against the target data set and the knowledge base set;
wherein the step of extracting a predictive model comprises performing at least one process from the group of processes consisting of: detecting a collection of rules and extracting the collection; formulating a set of equations and extracting the set; and training a neural network and extracting parameters describing the neural network.

2. A data mining method for generating predictive models in a computer system, said computer system comprising:
a user interface;
at least one data source;
at least one top-down data analysis module and at least one bottom-up data analysis module in cooperative communication with each other and in communication with the user interface, where the top-down data analysis module considers data supporting and refuting a piece of expressed knowledge, validates or invalidates the knowledge, and gives reasons for the validity or invalidity of the knowledge, and the bottom-up data analysis module discovers knowledge in data; and
a server processor, in communication with each data source and with the data analysis modules;
the method comprising the steps of:
selecting data from at least one data source;
constructing a target data set from the data selected from the data source(s);
extracting a predictive model using at least one of the data analysis modules based on the target data set;
storing the predictive model for future use;
generating a knowledge base set, wherein said knowledge base set includes a set of rules, a validated query phrase, and said predictive model;
selecting the knowledge base set; and
validating a query phrase against the target data set and the knowledge base set;
wherein the query phrase comprises a user-defined hypothesis, the method further comprising the steps of:
forming the hypothesis, using the data analysis module;
validating the hypothesis against the target data set; and
storing the validated hypothesis in the repository.

3. A data mining method for generating predictive models in a computer system, said computer system comprising:
a user interface;
at least one data source;
at least one top-down data analysis module and at least one bottom-up data analysis module in cooperative communication with each other and in communication with the user interface, where the top-down data analysis module considers data supporting and refuting a piece of expressed knowledge, validates or invalidates the knowledge, and gives reasons for the validity or invalidity of the knowledge, and the bottom-up data analysis module discovers knowledge in data; and
a server processor, in communication with each data source and with the data analysis modules;
the method comprising the steps of:
selecting data from at least one data source;
constructing a target data set from the data selected from the data source(s);
extracting a predictive model using at least one of the data analysis modules based on the target data set; and
storing the predictive model for future use;
wherein at least one of the data sources comprises a relational database, the method further comprising the steps of:
extracting a schema of data, including tables and attributes, from the relational database;
defining the target data set including at least one table, having at least one of the attributes, from the schema;
defining a user query phrase using one of the data analysis modules;
validating a query phrase against the target data set;
storing the validated query phrase; and
selectively directing the validated query phrase to the server processor.

4. The method of claim 3, wherein the query phrase comprises a user-defined hypothesis, the method further comprising the steps of:
forming a hypothesis, using the top-down data analysis module;
validating the hypothesis against the target data set; and
storing the validated hypothesis in the repository.

5. The method of claim 3, wherein the data analysis modules include a visualization module, the method further comprising the step of generating a visual display of the validated query phrase at the user interface.

6. A data mining method for generating predictive models in a computer system, said computer system comprising:
a user interface;
at least one data source;
at least one top-down data analysis module and at least one bottom-up data analysis module in cooperative communication with each other and in communication with the user interface, where the top-down data analysis module considers data supporting and refuting a piece of expressed knowledge, validates or invalidates the knowledge, and gives reasons for the validity or invalidity of the knowledge, and the bottom-up data analysis module discovers knowledge in data; and
a server processor, in communication with each data source and with the data analysis modules;
the method comprising the steps of:
selecting data from at least one data source;
constructing a target data set from the data selected from the data source(s);
extracting a predictive model using at least one of the data analysis modules based on the target data set; and
storing the predictive model for future use;
wherein at least one of the data sources comprises a relational database, the method further comprising the steps of:
extracting a schema of data, including tables and attributes, from the relational database; and
defining the target data set including at least one table, having at least one of the attributes, from the schema;
wherein the step of defining the target data set includes the step of joining a plurality of the tables.

7. A data mining method for generating predictive models in a computer system, said computer system comprising:
a user interface;
at least one data source;
at least one top-down data analysis module and at least one bottom-up data analysis module in cooperative communication with each other and in communication with the user interface, where the top-down data analysis module considers data supporting and refuting a piece of expressed knowledge, validates or invalidates the knowledge, and gives reasons for the validity or invalidity of the knowledge, and the bottom-up data analysis module discovers knowledge in data; and
a server processor, in communication with each data source and with the data analysis modules;
the method comprising the steps of:
selecting data from at least one data source;
constructing a target data set from the data selected from the data source(s);
extracting a predictive model using at least one of the data analysis modules based on the target data set; and
storing the predictive model for future use;
wherein at least one of the data sources comprises a relational database, the method further comprising the steps of:
extracting a schema of data, including tables and attributes, from the relational database; and
defining the target data set including at least one table, having at least one of the attributes, from the schema;
wherein the step of defining the target data set includes the step of constraining attributes of a selected table.

8. A data mining method for generating predictive models in a computer system, said computer system comprising:
a user interface;
at least one data source;
at least one top-down data analysis module and at least one bottom-up data analysis module in cooperative communication with each other and in communication with the user interface, where the top-down data analysis module considers data supporting and refuting a piece of expressed knowledge, validates or invalidates the knowledge, and gives reasons for the validity or invalidity of the knowledge, and the bottom-up data analysis module discovers knowledge in data; and
a server processor, in communication with each data source and with the data analysis modules;
the method comprising the steps of:
selecting data from at least one data source;
constructing a target data set from the data selected from the data source(s);
extracting a predictive model using at least one of the data analysis modules based on the target set; and
storing the predictive model for future use;
wherein the data analysis modules include an induction module, the method further comprising the steps of:
selecting the induction module using the user interface;
altering the target data set using user-specified parameters;
specifying a goal attribute; and
generating predictive models in the form of rules.

9. The method of claim 8, further comprising the step of editing the set of rules in accordance with user-specified parameters using the user interface.

10. The method of claim 9, further comprising the step of storing the edited set of rules in the repository.

11. The method of claim 8, further comprising the step of testing the set of rules against the altered target data set.

***** 